Semi-Supervised Active Clustering with Weak Oracles

Authors

  • Taewan Kim
  • Joydeep Ghosh
Abstract

Semi-supervised active clustering (SSAC) utilizes the knowledge of a domain expert to cluster data points by interactively making pairwise “same-cluster” queries. However, it is impractical to ask human oracles to answer every pairwise query. In this paper, we study the influence of allowing “not-sure” answers from a weak oracle and propose algorithms to efficiently handle uncertainties. Different types of model assumptions are analyzed to cover realistic scenarios of oracle abstention. In the first model, the random-weak oracle, an oracle randomly abstains with a certain probability. We also propose two distance-weak oracle models, which simulate the case of an oracle getting confused based on the distance between the two points in a pairwise query. For each weak oracle model, we show that a small query complexity is adequate for effective k-means clustering with high probability. Sufficient conditions for the guarantee include a γ-margin property of the data and the existence of a point close to each cluster center. Furthermore, we provide a sample complexity with a reduced effect of the cluster’s margin and only a logarithmic dependency on the data dimension. Our results allow significantly fewer same-cluster queries if the margin of the clusters is tight, i.e. γ ≈ 1. Experimental results on synthetic data show the effective performance of our approach in overcoming uncertainties.
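The random-weak oracle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name make_random_weak_oracle, the parameter p_abstain, the use of None for a “not-sure” answer, and the choice to draw each abstention independently per query are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional, Sequence


def make_random_weak_oracle(
    labels: Sequence[int], p_abstain: float, seed: int = 0
) -> Callable[[int, int], Optional[bool]]:
    """Build a same-cluster oracle that abstains at random.

    labels    -- hypothetical ground-truth cluster label of each point
    p_abstain -- assumed probability of a "not-sure" answer per query
    """
    if not 0.0 <= p_abstain <= 1.0:
        raise ValueError("p_abstain must lie in [0, 1]")
    rng = random.Random(seed)

    def oracle(i: int, j: int) -> Optional[bool]:
        # None stands for "not-sure"; this sketch draws the abstention
        # independently for every query (an assumption).
        if rng.random() < p_abstain:
            return None
        # Otherwise answer the same-cluster query truthfully.
        return labels[i] == labels[j]

    return oracle


# Usage: four points in two clusters, oracle abstaining 30% of the time.
labels = [0, 0, 1, 1]
oracle = make_random_weak_oracle(labels, p_abstain=0.3, seed=42)
answers = [oracle(0, j) for j in range(1, 4)]  # each is True, False, or None
```

An algorithm built on such an oracle has to tolerate None answers, for example by querying a point against several other points rather than relying on a single pair.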


Similar resources

Relaxed Oracles for Semi-Supervised Clustering

Pairwise “same-cluster” queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing “not-sure” answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle mo...


Active, semi-supervised learning to utilize human oracles

We present an approach to interactive machine learning, in which unlabeled data is employed in conjunction with active learning to better utilize the valuable resources that the human oracles provide. We empirically evaluate the approach in two very different applications, smartphone interruptibility prediction and semantic parsing. In both applications, we show that the use of active, semi-sup...


Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge, in terms of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...


An Improved Semi-supervised Clustering Algorithm Based on Active Learning

To address difficulties in traditional semi-supervised clustering algorithms, such as cluster deviation and high-dimensional data processing, a semi-supervised clustering algorithm based on active learning is proposed; it can effectively solve both problems. Using active learning strategies in the algorithm can obtain a large amount of in...


A confidence-based active approach for semi-supervised hierarchical clustering

Semi-supervised approaches have proven to be effective in clustering tasks. They allow user input, thus improving the quality of the clustering obtained, while maintaining a controllable level of user intervention. Despite being an important class of algorithms, hierarchical clustering has been little explored in semi-supervised solutions. In this report, we address the problem of semi-supervise...



Journal:
  • CoRR

Volume abs/1709.03202

Publication date 2017